BM25 for Non-Textual Modalities in Social Book Search
Abstract
The Social Book Search (SBS) lab at CLEF 2016 provides a complex test collection that makes it possible to experiment with retrieval methods that combine various modalities to produce the best possible ranked list. We show how the idea of being "characteristic", the core concept in most weighting schemes for textual modalities, can be applied to non-textual modalities. Our approach redefines BM25 for the three non-textual modalities found in the SBS collection: ratings, price, and number of pages. A fuzzy query is constructed from the user preferences inferred from the user's catalog. The results are used to re-rank a textual baseline, which significantly improves retrieval effectiveness.
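For orientation, the textual BM25 weight that the abstract takes as its starting point can be sketched as below. This is a minimal sketch of the standard Okapi BM25 term weight (with the commonly used non-negative IDF variant), not the paper's redefinition for ratings, price, or page count, which the abstract does not spell out. The `rerank` helper and its linear `weight` mix are purely hypothetical illustrations of re-ranking a baseline with a second score; all function and parameter names are assumptions.

```python
import math


def bm25_term_score(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Standard Okapi BM25 weight of one term in one document."""
    # IDF part: a term found in few documents is more "characteristic"
    # of the documents that contain it.
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    # TF part: saturating term frequency with document-length normalisation.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


def rerank(baseline, modality, weight=0.5):
    """Hypothetical re-ranking: linear mix of baseline and modality scores.

    Both arguments map document ids to scores; documents missing from
    `modality` contribute 0.0 for that part.
    """
    combined = {
        doc: (1.0 - weight) * score + weight * modality.get(doc, 0.0)
        for doc, score in baseline.items()
    }
    return sorted(combined, key=combined.get, reverse=True)
```

With equal term frequency and document length, a term that occurs in 5 of 1000 documents scores higher than one that occurs in 500, which is the "characteristic" intuition the abstract refers to.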
Related resources
Multimodal Social Book Search
Today’s information retrieval applications have become increasingly complex. The Social Book Search (SBS) lab at CLEF 2015 allows evaluating retrieval methods on a complex search task with several textual and non-textual meta-data fields. The challenge is to incorporate the different information types (modalities) into a single ranked list. We build a strong textual baseline and combine it with...
The Probabilistic Relevance Framework: BM25 and Beyond
The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970–1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Aga...
University of Santiago de Compostela at CLEF-IP09
In this paper we describe our participation in CLEF-IP 2009 (prior-art search task). This was the first year of the task, and we focused on how to effectively build a prior-art query from a patent. Basically, we implemented simple strategies to extract terms from some textual fields of the patent documents and gave preference to title terms. We ran experiments with standard BM25 configurations a...
Exploration of Proximity Heuristics in Length Normalization
Ranking functions in information retrieval are used primarily in search engines, and they are often adopted for various language processing applications. However, the features used to construct a ranking function should be analyzed before they are applied to a data set. This paper gives guidelines on the construction of generalized ranking functions with application-dependent features. The p...
Formulating Good Queries for Prior Art Search
In this paper we describe our participation in CLEF-IP 2009 (prior art search task). This was the first year of the task, and we focused on how to effectively build a prior art query from a patent. Basically, we implemented simple strategies to extract terms from some textual fields of the patent documents and gave more weight to title terms. We ran experiments with the well-known BM25 model. Al...